{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "93e3f84f",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/Structured-LLMReranker-Lyft-10k.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "11b52622-f38a-4b3a-a916-c73619babb48",
   "metadata": {},
   "source": [
    "# Structured LLM Reranker Demonstration (2021 Lyft 10-K)\n",
    "\n",
    "This tutorial demonstrates two-stage retrieval. First, use embedding-based retrieval with a high top-k value\n",
    "to maximize recall and collect a large set of candidate nodes. Then, use an LLM-based reranker\n",
    "to select the nodes that are actually relevant to the query, using structured output.\n",
    "\n",
    "Prefer `StructuredLLMRerank` over `LLMRerank` when your model supports function calling.\n",
    "This class uses the model's structured output capability instead of prompting the model to rank the nodes in a desired format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e66e4c0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index llama-index-llms-openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91b61e96-864c-4ed2-80d6-0ebfdbb57d5c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f630346f-fedd-40f2-8fb3-e052e219a873",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.core.postprocessor import StructuredLLMRerank\n",
    "\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "231c4418",
   "metadata": {},
   "source": [
    "## Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36f2294f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2025-03-20 15:13:23--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 1440303 (1.4M) [application/octet-stream]\n",
      "Saving to: ‘data/10k/lyft_2021.pdf’\n",
      "\n",
      "data/10k/lyft_2021. 100%[===================>]   1.37M  --.-KB/s    in 0.06s   \n",
      "\n",
      "2025-03-20 15:13:24 (23.9 MB/s) - ‘data/10k/lyft_2021.pdf’ saved [1440303/1440303]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/10k/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8a8fdeb2-939c-49dd-b8e6-0139d53a4fb6",
   "metadata": {},
   "source": [
    "## Load Data, Build Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee132733-b525-4aaf-81db-7eed19ded815",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Settings\n",
    "\n",
    "# Default LLM for the notebook (gpt-4o-mini)\n",
    "Settings.llm = OpenAI(temperature=0, model=\"gpt-4o-mini\")\n",
    "\n",
    "# Split the document into small, non-overlapping chunks\n",
    "Settings.chunk_overlap = 0\n",
    "Settings.chunk_size = 128"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88344ea2-540c-4b46-8a66-ed78870cb80a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\n",
    "    input_files=[\"./data/10k/lyft_2021.pdf\"]\n",
    ").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d478f416-8e85-49ae-9817-e4d78122a120",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectorStoreIndex.from_documents(\n",
    "    documents,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7e3d5f23-dfcd-458d-a9d3-dd66de0ab054",
   "metadata": {},
   "source": [
    "## Retrieval Comparisons"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47480c6d-8914-4562-a789-fd53a99a7afb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.retrievers import VectorIndexRetriever\n",
    "from llama_index.core import QueryBundle\n",
    "import pandas as pd\n",
    "from IPython.display import display, HTML\n",
    "from copy import deepcopy\n",
    "\n",
    "\n",
    "def get_retrieved_nodes(\n",
    "    query_str, vector_top_k=10, reranker_top_n=3, with_reranker=False\n",
    "):\n",
    "    \"\"\"Retrieve nodes by embedding similarity, optionally reranking them with the LLM.\"\"\"\n",
    "    query_bundle = QueryBundle(query_str)\n",
    "    # stage 1: embedding-based retrieval of the top `vector_top_k` candidates\n",
    "    retriever = VectorIndexRetriever(\n",
    "        index=index,\n",
    "        similarity_top_k=vector_top_k,\n",
    "    )\n",
    "    retrieved_nodes = retriever.retrieve(query_bundle)\n",
    "\n",
    "    if with_reranker:\n",
    "        # stage 2: the LLM scores candidates in batches of 5; keep the top `reranker_top_n`\n",
    "        reranker = StructuredLLMRerank(\n",
    "            choice_batch_size=5,\n",
    "            top_n=reranker_top_n,\n",
    "        )\n",
    "        retrieved_nodes = reranker.postprocess_nodes(\n",
    "            retrieved_nodes, query_bundle\n",
    "        )\n",
    "\n",
    "    return retrieved_nodes\n",
    "\n",
    "\n",
    "def pretty_print(df):\n",
    "    # render the DataFrame as HTML, turning escaped newlines into <br> tags\n",
    "    return display(HTML(df.to_html().replace(\"\\\\n\", \"<br>\")))\n",
    "\n",
    "\n",
    "def visualize_retrieved_nodes(nodes) -> None:\n",
    "    \"\"\"Display the score and text of each retrieved node as a table.\"\"\"\n",
    "    result_dicts = []\n",
    "    for node in nodes:\n",
    "        # work on a copy so the original node's metadata is left untouched\n",
    "        node = deepcopy(node)\n",
    "        node.node.metadata = {}\n",
    "        node_text = node.node.get_text()\n",
    "        node_text = node_text.replace(\"\\n\", \" \")\n",
    "\n",
    "        result_dict = {\"Score\": node.score, \"Text\": node_text}\n",
    "        result_dicts.append(result_dict)\n",
    "\n",
    "    pretty_print(pd.DataFrame(result_dicts))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8bedc4f-444b-4233-9b72-728e3cfbe056",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_nodes = get_retrieved_nodes(\n",
    "    \"What is Lyft's response to COVID-19?\", vector_top_k=5, with_reranker=False\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e85e656b-9377-4640-a10d-a6655afd82bd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "      <th>Text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.870327</td>\n",
       "      <td>Further, COVID-19 has and may continue to negatively impact Lyft’s ability to conduct rental operationsthrough the Express Drive program and Lyft Rentals as a result of restrictions on travel, mandated closures, limited staffing availability, and other factors relatedto COVID-19. For example, in 2020, Lyft Rentals temporarily ceased operations, closing its rental locations, as a result of COVID-19.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.858815</td>\n",
       "      <td>The Company has adopted a number of measures in response to the COVID-19 pandemic including, but not limited to, establishing new health and safetyrequirements for ridesharing and updating workplace policies. The Company also made adjustments to its expenses and cash flow to correlate with declines in revenuesincluding headcount reductions in 2020. Refer to Note 17 “Restructuring” to the consolidated financial statements for information regarding the 2020 restructuring events.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.857701</td>\n",
       "      <td>•The responsive measures to the COVID-19 pandemic have caused us to modify our business practices by permitting corporate employees in nearly all of ourlocations  to  work  remotely,  limiting  employee  travel,  and  canceling,  postponing  or  holding  virtual  events  and  meetings.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.855108</td>\n",
       "      <td>The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing  and  updating  workplace  policies.  We  also  made  adjustments  to  our  expenses  and  cash  flow  to  correlate  with  declines  in  revenues  including  headcountreductions in 2020.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.854779</td>\n",
       "      <td>In 2020, Flexdrive also began to waive rental fees for drivers who are confirmed to have testedpositive for COVID-19 or requested to quarantine by a medical professional, which it continues to do at this time. Further, Lyft Rentals and Flexdrive have facedsignificantly higher costs in transporting, repossessing, cleaning, and17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "visualize_retrieved_nodes(new_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ba150e2-c4e7-4404-b8e1-1603c2b346d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_nodes = get_retrieved_nodes(\n",
    "    \"What is Lyft's response to COVID-19?\",\n",
    "    vector_top_k=20,\n",
    "    reranker_top_n=5,\n",
    "    with_reranker=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7541606f-6424-470b-987c-a986ac0a7cf8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "      <th>Text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10.0</td>\n",
       "      <td>The Company has adopted a number of measures in response to the COVID-19 pandemic including, but not limited to, establishing new health and safetyrequirements for ridesharing and updating workplace policies. The Company also made adjustments to its expenses and cash flow to correlate with declines in revenuesincluding headcount reductions in 2020. Refer to Note 17 “Restructuring” to the consolidated financial statements for information regarding the 2020 restructuring events.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.0</td>\n",
       "      <td>We have adopted several measures in response to the COVID-19 pandemic including, but not limited to, establishing new health and safety requirements forridesharing, and updating workplace policies. We also made adjustments to our expenses and cash flow to correlate with declines in revenues including the transaction withWoven  Planet  completed  on  July  13,  2021  and  headcount  reductions  in  2020.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10.0</td>\n",
       "      <td>•manage our platform and our business assets and expenses in light of the COVID-19 pandemic and related public health measures issued by various jurisdictions,including  travel  bans,  travel  restrictions  and  shelter-in-place  orders,  as  well  as  maintain  demand  for  and  confidence  in  the  safety  of  our  platform  during  andfollowing the COVID-19 pandemic;•plan for and manage capital expenditures for our current and future offerings,</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9.0</td>\n",
       "      <td>The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing  and  updating  workplace  policies.  We  also  made  adjustments  to  our  expenses  and  cash  flow  to  correlate  with  declines  in  revenues  including  headcountreductions in 2020.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9.0</td>\n",
       "      <td>The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing  and  updating  workplace  policies.  We  also  made  adjustments  to  our  expenses  and  cash  flow  to  correlate  with  declines  in  revenues  including  headcountreductions in 2020.56</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "visualize_retrieved_nodes(new_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb88c0bb-4d3f-4426-b2be-66b0d5635abb",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_nodes = get_retrieved_nodes(\n",
    "    \"What initiatives are the company focusing on independently of COVID-19?\",\n",
    "    vector_top_k=5,\n",
    "    with_reranker=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c4b3962-2873-40b3-9f50-58cf7685454a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "      <th>Text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.813871</td>\n",
       "      <td>•The responsive measures to the COVID-19 pandemic have caused us to modify our business practices by permitting corporate employees in nearly all of ourlocations  to  work  remotely,  limiting  employee  travel,  and  canceling,  postponing  or  holding  virtual  events  and  meetings.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.810687</td>\n",
       "      <td>•manage our platform and our business assets and expenses in light of the COVID-19 pandemic and related public health measures issued by various jurisdictions,including  travel  bans,  travel  restrictions  and  shelter-in-place  orders,  as  well  as  maintain  demand  for  and  confidence  in  the  safety  of  our  platform  during  andfollowing the COVID-19 pandemic;•plan for and manage capital expenditures for our current and future offerings,</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.809540</td>\n",
       "      <td>The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing  and  updating  workplace  policies.  We  also  made  adjustments  to  our  expenses  and  cash  flow  to  correlate  with  declines  in  revenues  including  headcountreductions in 2020.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.806794</td>\n",
       "      <td>the timing and extent of spending to support ourefforts to develop our platform, actual insurance payments for which we have made reserves, measures we take in response to the COVID-19 pandemic, our ability tomaintain demand for and confidence in the safety of our platform during and following the COVID-19 pandemic, and the expansion of sales and marketing activities.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.805533</td>\n",
       "      <td>•anticipate and respond to macroeconomic changes and changes in the markets in which we operate;•maintain and enhance the value of our reputation and brand;•effectively manage our growth and business operations, including the impacts of the COVID-19 pandemic on our business;•successfully expand our geographic reach;•hire, integrate and retain talented people at all levels of our organization;•successfully develop new platform features, offerings and services to enhance the experience of users; and•right-size our real estate portfolio.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "visualize_retrieved_nodes(new_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1dcc798a-3940-4426-8110-d9a3f7dc5a68",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_nodes = get_retrieved_nodes(\n",
    "    \"What initiatives are the company focusing on independently of COVID-19?\",\n",
    "    vector_top_k=40,\n",
    "    reranker_top_n=5,\n",
    "    with_reranker=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38bae391-7dca-4f8d-a90e-ada1beddcd2e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "      <th>Text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9.0</td>\n",
       "      <td>Even as we invest in the business, we also remain focused on finding ways to operate more efficiently.To advance our mission, we aim to build the defining brand of our generation and to advocate through our commitment to social and environmental responsibility.We  believe  that  our  brand  represents  freedom  at  your  fingertips:  freedom  from  the  stresses  of  car  ownership  and  freedom  to  do  and  see  more.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>8.0</td>\n",
       "      <td>We have also invested in sales and marketing to grow our community,cultivate a differentiated brand that resonates with drivers and riders and promote further brand awareness. Together, these investments have enabled us to create a powerfulmultimodal platform and scaled user network.Notwithstanding the impact of COVID-19, we are continuing to invest in the future, both organically and through acquisitions of complementary businesses.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8.0</td>\n",
       "      <td>As a result, we may introduce significantchanges  to  our  existing  offerings  or  develop  and  introduce  new  and  unproven  offerings.  For  example,  in  April  2020,  we  began  piloting  a  delivery  service  platform  inresponse to the COVID-19 pandemic.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6.0</td>\n",
       "      <td>•anticipate and respond to macroeconomic changes and changes in the markets in which we operate;•maintain and enhance the value of our reputation and brand;•effectively manage our growth and business operations, including the impacts of the COVID-19 pandemic on our business;•successfully expand our geographic reach;•hire, integrate and retain talented people at all levels of our organization;•successfully develop new platform features, offerings and services to enhance the experience of users; and•right-size our real estate portfolio.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>6.0</td>\n",
       "      <td>has  been  critical  to  our  success.  We  face  a  number  ofchallenges that may affect our ability to sustain our corporate culture, including:•failure to identify, attract, reward and retain people in leadership positions in our organization who share and further our culture, values and mission;•the increasing size and geographic diversity of our workforce;•shelter-in-place orders in certain jurisdictions where we operate that have required many of our employees to work remotely,</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "visualize_retrieved_nodes(new_nodes)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama-index-cXQhuK8v-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
